Recognizing voice over IP: a robust front-end for speech recognition on the world wide web

Authors

  • Carmen Peláez-Moreno
  • Ascensión Gallardo-Antolín
  • Fernando Díaz-de-María
Abstract

The Internet Protocol (IP) environment introduces two relevant sources of distortion into the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most widely used coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement grows as network conditions worsen.

Similar resources

Evaluation and optimization of noise robust front-end technologies for the automatic recognition of Hungarian telephone speech

In this paper a variety of front-end configurations are evaluated on Hungarian telephone speech databases. Our aim was to measure directly the efficiency of the front-ends on real noisy and normal speech data. As a baseline the ETSI ADSR standard front-end is used. Some simplification on the standard is introduced resulting in better performance on our databases than the original front-end in t...

Quantization of cepstral parameters for speech recognition over the World Wide Web

We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Intern...

The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions

This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may either be used for the evaluation of front-end feature extraction algorithms using a defined HMM recognition back-end or complete recognition systems. The source speech for this database is the TIdigits, consisting of connected digits task spoken by America...

A study of mutual front-end processing method based on statistical model for noise robust speech recognition

This paper addresses robust front-end processing for automatic speech recognition (ASR) in noise. Accurate recognition of corrupted speech requires noise robust front-end processing, e.g., voice activity detection (VAD) and noise suppression (NS). Typically, VAD and NS are combined as one-way processing, and are developed independently. However, VAD and NS should not be assumed to be independen...

Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio

This paper proposes a front-end processing method for automatic speech recognition (ASR) that employs a voice activity detection (VAD) method based on the periodic to aperiodic component ratio (PAR). The proposed VAD method is called PARADE (PAR based Activity DEtection). By considering the powers of the periodic and aperiodic components of the observed signals simultaneously, PARADE can detect...

Journal:
  • IEEE Trans. Multimedia

Volume 3  Issue 

Pages  -

Publication date: 2001